5 research outputs found

    Mixed-precision deep learning based on computational memory

    Full text link
    Deep neural networks (DNNs) have revolutionized the field of artificial intelligence and have achieved unprecedented success in cognitive tasks such as image and speech recognition. Training of large DNNs, however, is computationally intensive and this has motivated the search for novel computing architectures targeting this application. A computational memory unit with nanoscale resistive memory devices organized in crossbar arrays could store the synaptic weights in their conductance states and perform the expensive weighted summations in place in a non-von Neumann manner. However, updating the conductance states in a reliable manner during the weight update process is a fundamental challenge that limits the training accuracy of such an implementation. Here, we propose a mixed-precision architecture that combines a computational memory unit performing the weighted summations and imprecise conductance updates with a digital processing unit that accumulates the weight updates in high precision. A combined hardware/software training experiment of a multilayer perceptron based on the proposed architecture using a phase-change memory (PCM) array achieves 97.73% test accuracy on the task of classifying handwritten digits (based on the MNIST dataset), within 0.6% of the software baseline. The architecture is further evaluated using accurate behavioral models of PCM on a wide class of networks, namely convolutional neural networks, long-short-term-memory networks, and generative-adversarial networks. Accuracies comparable to those of floating-point implementations are achieved without being constrained by the non-idealities associated with the PCM devices. A system-level study demonstrates 173x improvement in energy efficiency of the architecture when used for training a multilayer perceptron compared with a dedicated fully digital 32-bit implementation

    Computational Memory Design

    No full text

    A Multi-Memristive Unit-Cell Array With Diagonal Interconnects for In-Memory Computing

    No full text
    Memristive crossbar arrays can be used to realize matrix-vector multiplication (MVM) operations in constant time complexity by exploiting the Kirchhoff's circuit laws. This is enabled by the parallel read of the entire array in a single time step. However, parallel writing is prohibitive in such arrays due to limitations on the current that could be accumulated along the wires. Hence, loading the matrix elements into such an array still incurs significant time penalty. Another key challenge is the achievable computational precision. To overcome these challenges, we propose a unit-cell array design where each unit-cell comprises four memristive devices each attached to a selection transistor. Moreover, the array is organized in such a way that the selection transistors can be turned on in a diagonal fashion. We experimentally demonstrated this concept by fabricating a 2 x 2 unit-cell array based on projected phase-change memory (PCM) devices in 90 nm CMOS technology. It is shown that using the diagonal connections, the write operations can be parallelized while maintaining the current limit of the back-end-of-the-line metallization. Moreover, the increase in write time due to having more devices per unit-cell is minimized through a combination of single-shot and iterative programming schemes. Finally, we present experimental results on MVM operations that demonstrate improved computational precision exceeding that of a 4-bit fixed-point implementation.ISSN:1549-7747ISSN:1057-7130ISSN:1558-3791ISSN:1558-125

    Experimental Efficiency Evaluation of Stacked Transistor Half-Bridge Topologies in 14 nm CMOS Technology

    No full text
    Different Half-Bridge (HB) converter topologies for an Integrated Voltage Regulator (IVR), which serves as a microprocessor application, were evaluated. The HB circuits were implemented with Stacked Transistors (HBSTs) in a cutting-edge 14 nm CMOS technology node in order to enable the integration on the microprocessor die. Compared to a conventional realization of the HBST, it was found that the Active Neutral-Point Clamped (ANPC) HBST topology with Independent Clamp Switches (ICSs) not only ensured balanced blocking voltages across the series-connected transistors, but also featured a more robust operation and achieved higher efficiencies at high output currents. The IVR achieved a maximum efficiency of 85.3% at an output current of 300 mA and a switching frequency of 50 MHz. At the maximum measured output current of 780 mA, the efficiency was 83.1%. The active part of the IVR (power switches, gate-drivers, and level shifters) realized a high maximum current density of 24.7 A/mm2.ISSN:2079-929
    corecore